Methods and apparatus for generating effective test code for out of order superscalar microprocessors

ABSTRACT

A technique for producing a test executable in a computer. The technique involves forming multiple instruction streams. The technique further involves dividing the multiple instruction streams into portions, and generating a combined instruction stream having the portions interleaved. Additionally, the technique involves creating a test executable from the combined instruction stream. The test executable can be used for testing a simulated processor in a computer. In particular, the test executable is loaded. Then, the test executable is run through the simulated processor to generate processor results and through a reference model to generate reference results. The processor results and the reference results are compared to determine whether the simulated processor operates correctly.

RELATED APPLICATIONS

[0001] This application is a continuation of U.S. application Ser. No. 09/106,691, filed Jun. 29, 1998. The entire teachings of the above application are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] The process of designing a data processor typically includes testing for design flaws at various stages of development. Such testing often involves running one or more test executables through a processor simulation system during a processor simulation stage of development, or through an actual processor in semiconductor form after a fabrication stage. In general, these test executables attempt to stress particular circuits and features of the processor.

[0003] A superscalar processor is a processor that is capable of executing multiple instructions simultaneously. Such processors typically include an execution stage having multiple execution units (execution circuits), each of which can execute an instruction independently of other execution units. Designers typically test superscalar processors using test executables created from source code having few or no instruction dependencies, or source code having weak instruction dependencies.

[0004] An instruction dependency (also referred to as a data hazard) exists when two instructions attempt to access the same register. The strongest type of instruction dependency is a read-after-write (RAW) dependency in which an initial instruction writes a result to a register and a subsequent instruction reads the result from that register. The subsequent instruction must wait until the initial instruction completes writing the result before it can read the result. The weakest type of instruction dependency is a read-after-read (RAR) dependency, in which two instructions attempt to read from the same register. Other types of instruction dependencies include write-after-read (WAR) and write-after-write (WAW) dependencies.
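The dependency types described above can be illustrated with a minimal Python sketch. The `Inst` record and the `hazards` function are hypothetical names introduced only for this illustration, and the sketch assumes the three-operand form (OP SRC 0, SRC 1, DEST) used in the example streams. Two instructions that merely read the same register are not reported, since neither has to wait for the other.

```python
from typing import List, NamedTuple


class Inst(NamedTuple):
    """One three-operand instruction: OP SRC0, SRC1, DEST (assumed format)."""
    op: str
    src0: str
    src1: str
    dest: str


def hazards(first: Inst, second: Inst) -> List[str]:
    """Return the register hazards that `second` has on the earlier `first`."""
    found = []
    if first.dest in (second.src0, second.src1):
        found.append("RAW")  # second reads the register that first wrote
    if second.dest in (first.src0, first.src1):
        found.append("WAR")  # second overwrites a register that first read
    if second.dest == first.dest:
        found.append("WAW")  # both instructions write the same register
    return found


# Illustrative instructions only; the register choices are assumptions.
producer = Inst("addq", "R01", "R02", "R03")
consumer = Inst("subl", "R03", "R04", "R05")
unrelated = Inst("subl", "R08", "R09", "R10")
```

Here `hazards(producer, consumer)` reports a RAW dependency because the second instruction reads R03, while `hazards(producer, unrelated)` reports none.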

[0005] Instruction streams with weak instruction dependencies or no instruction dependencies stress the multiple execution capabilities of superscalar processors since there is little or no need to delay the instructions of such streams. Accordingly, instructions generally can execute as soon as an execution unit becomes available.

[0006] Stream #1, as shown below, includes no instruction dependencies, and stresses the multiple issue feature of superscalar processors.

STREAM #1
Inst. #   OP     SRC 0   SRC 1   DEST.
1         addq   R01,    R02,    R03
2         subl   R04,    R05,    R06
3         addq   R07,    R08,    R09
4         subl   R10,    R11,    R12

[0007] Instruction 1 adds the contents of source register R01 to source register R02, and stores the result in destination register R03. Instruction 2 subtracts the contents of R04 from R05, and stores the result in R06. Instruction 3 adds the contents of R07 to register R08, and stores the result in R09. Instruction 4 subtracts the contents of R10 from R11, and stores the result in R12. Since none of the instructions access the same registers, there are no instruction dependencies. Accordingly, subsequent instructions do not need to be delayed while earlier instructions complete, and instructions may issue as soon as execution units become available to execute them. As a result, the execution units of the superscalar processor are consistently kept busy. For these reasons, designers of superscalar processors often create large executables similar to Stream #1, and use such executables to test the superscalar capabilities of their processor designs.

[0008] Another type of processor is called an out-of-order processor. An out-of-order processor is a processor that obtains instructions in a program order, and that is capable of executing instructions in an order that is different than the program order (i.e., capable of executing instructions out-of-order). Such processors typically include an issue queue that queues the instructions obtained in program order, and that is capable of issuing instructions out-of-order when instruction dependencies require that the processor delay issuance of instructions next in line. Designers typically test out-of-order processors using a test executable created from source code having a large number of instructions with strong dependencies.

[0009] Stream #2 includes instructions with strong dependencies, and stresses the out-of-order issue feature of out-of-order processors.

STREAM #2
Inst. #   OP     SRC 0   SRC 1   DEST.
1         addq   R01,    R02,    R03
2         subl   R03,    R04,    R05
3         addq   R03,    R06,    R07
4         subl   R08,    R09,    R10

[0010] Instruction 1 adds the contents of source register R01 to source register R02, and stores the result in destination register R03. Instruction 2 subtracts the contents of R03 from R04, and stores the result in R05. Instruction 3 adds the contents of R03 to R06, and stores the result in R07. Instruction 4 subtracts the contents of R08 from R09, and stores the result in R10. Since Instruction 1 stores its result in R03 and each of Instructions 2 and 3 reads from R03, Instructions 2 and 3 have instruction dependencies with Instruction 1. Accordingly, Instructions 2 and 3 cannot issue until Instruction 1 stores its result.

[0011] In contrast, Instruction 4 can issue at any time relative to Instructions 1, 2 or 3 since Instruction 4 does not access any registers that are accessed by the other instructions. Accordingly, an out-of-order processor may issue Instruction 1, and subsequently issue Instruction 4 prior to issuing Instructions 2 and 3. For these reasons, designers of out-of-order processors often create large executables from instruction streams similar to Stream #2 to cause instructions to issue out-of-order, and then use such executables to stress the out-of-order capabilities of their processor designs.

[0012] Some processors include both superscalar and out-of-order features. The superscalar feature of such a processor can be tested by running a test executable having instructions without dependencies similar to that of Stream #1 (shown above). Additionally, the out-of-order feature can be tested by running another test executable having instructions with dependencies similar to that of Stream #2 (shown above).

SUMMARY OF THE INVENTION

[0013] Stream #1, shown above, may stress a processor's superscalar capabilities, but does not stress the processor's out-of-order capabilities simultaneously. Similarly, Stream #2, shown above, may stress a processor's out-of-order capabilities, but does not stress the processor's superscalar capabilities simultaneously. Unfortunately, many design problems in complex processors will only be discovered when multiple processor features are stressed simultaneously.

[0014] A stream suitable for testing a processor's superscalar capabilities with few or no dependencies (e.g., Stream #1 above) can be modified by introducing strong instruction dependencies, e.g., read-after-write (RAW) dependencies. However, increasing the number of RAW instruction dependencies reduces the number of independent instructions (instructions without dependencies) within the stream. That is, the modification may improve the stream's opportunity to cause an out-of-order execution, but the resulting stream may no longer be able to consistently stress the superscalar structures of the processor. Accordingly, some execution units may become idle and the throughput of the processor will decrease.

[0015] An embodiment of the invention is directed to a technique that can produce, in a computer, a test executable that can simultaneously test the superscalar and out-of-order capabilities of a processor. The technique involves forming multiple instruction streams, dividing the multiple instruction streams into portions, and generating a combined instruction stream having the portions interleaved. The technique further involves creating a test executable from the combined instruction stream.

[0016] Formation of multiple instruction streams preferably involves constructing the multiple instruction streams such that the multiple instruction streams access different groups of registers. Each instruction stream can provide instructions with strong dependencies for testing the out-of-order capabilities of the processor. Additionally, the instructions within any particular stream are independent of the instructions of the other streams such that multiple execution units of the processor can be consistently kept busy.

[0017] Construction of the multiple instruction streams may involve operating a code generator such that the code generator provides each of the multiple instruction streams. Alternatively, such construction may involve operating a code generator such that the code generator provides a particular instruction stream, and forming other instruction streams according to the particular instruction stream.

[0018] To divide the streams into portions and generate a combined instruction stream having the stream portions, the technique may involve interleaving the portions within the combined instruction stream such that the portions alternate in a round-robin manner. Alternatively, the technique may involve interleaving the portions within the combined instruction stream such that the portions alternate in a pseudo random manner. Interleaving in a pseudo random manner may introduce nuances within the instruction stream that uncover design flaws that would otherwise be undetected.

[0019] Additional nuances within the instruction stream can be introduced in other ways, as well. In particular, the technique may further involve, prior to creating the test executable, including conflict instructions (e.g., instructions that cause conflicts) within the combined instruction stream. For example, LOAD instructions that cause cache misses may be included within the instruction stream to purposefully stall instructions with dependencies within the instruction stream. The LOAD instructions would more fully stress the processor's out-of-order capabilities by adding delays to particular instructions depending on the LOAD instructions.

[0020] Furthermore, the formation of the multiple instruction streams may involve constructing the multiple instruction streams such that the multiple instruction streams communicate with each other. In particular, the multiple instruction streams can be formed such that they access common registers. Additionally, the multiple instruction streams can be formed such that they share common memory spaces. The sharing of common registers or memory spaces enhances the breadth of the processor test by also testing inter-stream communication aspects of the processor.

[0021] Another embodiment of the invention is directed to a simulation system for testing a simulated processor. The system includes an input that receives a test executable created from a combined instruction stream having interleaved portions of multiple instruction streams. The system further includes a processor simulator, coupled to the input, that runs the test executable to generate processor results. Additionally, the system includes a reference model, coupled to the input, that runs the test executable to generate reference results. Furthermore, the system includes a compare module, coupled to the processor simulator and the reference model, that compares the processor results and the reference results to determine whether the simulated processor operates correctly. The system simultaneously stresses the superscalar and out-of-order capabilities of the processor simulator such that design flaws can be detected and corrected prior to fabrication of the actual processor.

BRIEF DESCRIPTION OF THE DRAWINGS

[0022] The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views.

[0023] The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.

[0024] FIG. 1 is a block diagram of an apparatus for producing a test executable.

[0025] FIG. 2 is a flow diagram of a method for producing a test executable.

[0026] FIG. 3 shows multiple instruction streams that are formed by a multiple instruction stream generator circuit of FIG. 1.

[0027] FIG. 4 is a combined instruction stream that is generated by an interleaver circuit of FIG. 1.

[0028] FIG. 5 is a block diagram of a simulation system for testing a simulated processor using a test executable that is created by the apparatus of FIG. 1.

[0029] FIG. 6A is a chart showing contents of an issue queue of FIG. 5 after a first fetch of instructions.

[0030] FIG. 6B is a chart showing contents of the issue queue of FIG. 5 after a second fetch of instructions.

[0031] FIG. 6C is a chart showing contents of the issue queue of FIG. 5 after a third fetch of instructions.

[0032] FIG. 6D is a chart showing contents of the issue queue of FIG. 5 after a fourth fetch of instructions.

[0033] FIG. 7 is a chart showing issue times for each of the instructions of the combined instruction stream of FIG. 4 when executed by the simulation system of FIG. 5.

DETAILED DESCRIPTION OF THE INVENTION

[0034] An embodiment of the invention is directed to a technique for producing a test executable that can stress both the superscalar and out-of-order capabilities of a processor design. The test executable is created from a combined instruction stream having interleaved portions of multiple instruction streams.

[0035] Reference is now made to the drawings wherein the same reference numbers are used throughout multiple figures to designate the same or similar components. FIG. 1 shows an apparatus for producing the test executable. The apparatus 20 includes a multiple instruction stream generator circuit 22, an interleaver circuit 24 and a compiler circuit 26. As will now be explained, the circuits of the apparatus 20 perform a method 50 as shown in FIG. 2.

[0036] In step 52, the multiple instruction stream generator circuit 22 forms multiple instruction streams, and stores the instruction streams in respective files 30 (e.g., lis files). The instruction streams access different groups of registers as indicated by configuration information 28 that is received by the multiple instruction stream generator circuit 22.

[0037] In step 54, the interleaver circuit 24 divides the multiple instruction streams into portions, and generates a combined instruction stream having the portions interleaved. The combined instruction stream is stored within a file 32 (e.g., a mar file). The number of instructions in each portion is controlled by the configuration information 28.

[0038] In step 56, the compiler circuit 26 creates a test executable from the combined instruction stream. In particular, the compiler circuit 26 compiles the combined instruction stream, and stores the test executable as a file 34 (e.g., a dxe file). Such an executable is suitable for execution by a simulated processor or an actual processor.

[0039] Further details of the multiple instruction stream generator circuit 22 will now be provided. The multiple instruction stream generator circuit 22 includes a constructor circuit 36 and a storage device 38 (e.g., disk memory), as shown in FIG. 1. The constructor circuit 36 includes a code generator 40 and a control circuit 42. The control circuit 42 operates the code generator 40 to form the multiple instruction streams. Preferably, the control circuit 42 runs the code generator 40 to produce a single instruction stream, makes multiple copies of the single instruction stream, and modifies the copies such that they access different groups of registers based on the configuration information 28. Alternatively, the control circuit 42 runs the code generator 40 multiple times to form the multiple instruction streams that access the different groups of registers. The constructor circuit 36 stores the multiple instruction streams within the files 30 in the storage device 38.
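The preferred approach, in which a single generated stream is copied and each copy is modified to access its own group of registers, might be sketched in Python as follows. This is only an illustration under stated assumptions: instructions are held as text lines with register names of the form R01 through R32, each group is a contiguous run of registers, and the names `register_group`, `remap_stream` and `derive_streams` are hypothetical.

```python
import re
from typing import Dict, List

REGISTER = re.compile(r"R\d{2}")  # assumed register naming: R01, R02, ...


def register_group(first: int, count: int) -> List[str]:
    """Names of `count` consecutive registers starting at number `first`."""
    return [f"R{n:02d}" for n in range(first, first + count)]


def remap_stream(stream: List[str], mapping: Dict[str, str]) -> List[str]:
    """Copy a stream, renaming every register that appears in `mapping`."""
    return [
        REGISTER.sub(lambda m: mapping.get(m.group(0), m.group(0)), inst)
        for inst in stream
    ]


def derive_streams(base: List[str], group_size: int, num_streams: int) -> List[List[str]]:
    """Form `num_streams` copies of `base`, each using its own register group."""
    base_group = register_group(1, group_size)
    streams = []
    for k in range(num_streams):
        target = register_group(1 + k * group_size, group_size)
        streams.append(remap_stream(base, dict(zip(base_group, target))))
    return streams


# Illustrative two-instruction base stream using registers from R01 to R08.
stream_a = ["addq R01, R02, R03", "subl R03, R04, R05"]
streams = derive_streams(stream_a, group_size=8, num_streams=4)
```

Because every register is renamed in a single pass over each line, the RAW dependencies of the base stream are preserved within each copy, while no register is shared between copies.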

[0040] It should be understood that the apparatus 20 is preferably a general purpose computer having code for producing the test executables. In particular, the code controls the general purpose computer such that it functions at various times as the multiple instruction stream generator circuit 22, the interleaver circuit 24 and the compiler circuit 26. Alternatively, the apparatus 20 may be a specialized apparatus designed specifically to perform the method 50 of FIG. 2.

[0041] The operation of the apparatus 20 will be further explained by way of example. FIG. 3 shows four instruction streams (STREAM A, STREAM B, STREAM C and STREAM D) that can be formed by the multiple instruction stream generator circuit 22. The configuration information 28 that is used by the multiple instruction stream generator circuit 22 controls particular aspects of the multiple instruction streams such as the number of streams that are formed, their length (the number of instructions within each stream), which registers are accessed by each instruction stream, and the type of instructions within each instruction stream (e.g., load, add, shift, etc.).

[0042] The instructions within STREAM A access a first group of registers, namely R01 through R08. The instructions following Instruction 1 have strong dependencies on preceding instructions. For example, Instruction 1 writes to R01, and Instruction 2 reads from R01. Accordingly, Instruction 1 must complete writing to R01 before Instruction 2 can read from R01. In a similar manner, Instruction 3 depends on Instructions 1 and 2, and so on.

[0043] STREAM B, STREAM C and STREAM D include instructions that are arranged in a manner similar to that of STREAM A, except that these instruction streams access different groups of registers. In particular, STREAM B accesses registers R09 through R16, STREAM C accesses registers R17 through R24, and STREAM D accesses registers R25 through R32. Each instruction stream formed by the multiple instruction stream generator circuit 22 is stored, at least temporarily, in the storage device 38 for use by the interleaver circuit 24.

[0044] FIG. 4 shows a combined instruction stream that is generated by the interleaver circuit 24 from the instruction streams shown in FIG. 3. The interleaver circuit 24 divides the instruction streams into portions, and then interleaves the portions to generate the combined instruction stream. The configuration information 28 controls the size and ordering of the portions within the combined instruction stream. As shown in FIG. 4, the first five instructions of the combined instruction stream are from a portion of STREAM A (see FIG. 3). Similarly, the next five instructions of the combined instruction stream are from a portion of STREAM B, and so on.

[0045] The manner of interleaving is based on the configuration information 28. In particular, the interleaver circuit 24 can generate the combined instruction stream such that it cycles through portions of STREAM A, STREAM B, STREAM C and STREAM D. Such an arrangement of portions is considered to be a round-robin ordering of the portions. Alternatively, the interleaver circuit 24 can generate the combined instruction stream such that it includes portions of the streams in a pseudo random order.
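The round-robin and pseudo random orderings can be sketched in Python as shown below. The function names, the `seed` parameter and the use of Python's `random.Random` generator are assumptions made for this illustration rather than details of the interleaver circuit 24. In both modes the portions of any one stream appear in their original order, so the dependencies within that stream are preserved.

```python
import random
from typing import List


def split_into_portions(stream: List[str], size: int) -> List[List[str]]:
    """Divide one instruction stream into consecutive portions of `size`."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def interleave(streams: List[List[str]], portion_size: int,
               pseudo_random: bool = False, seed: int = 0) -> List[str]:
    """Generate a combined stream from interleaved portions of `streams`."""
    queues = [split_into_portions(s, portion_size) for s in streams]
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    combined: List[str] = []
    while any(queues):
        if pseudo_random:
            # Pick any stream that still has portions left.
            index = rng.choice([i for i, q in enumerate(queues) if q])
            combined.extend(queues[index].pop(0))
        else:
            # Round-robin: take the next portion of each stream in turn.
            for q in queues:
                if q:
                    combined.extend(q.pop(0))
    return combined


# Illustrative streams: ten placeholder instructions each.
stream_a = [f"A{i}" for i in range(1, 11)]
stream_b = [f"B{i}" for i in range(1, 11)]
round_robin = interleave([stream_a, stream_b], portion_size=5)
shuffled = interleave([stream_a, stream_b], portion_size=5, pseudo_random=True, seed=1)
```

With a portion size of five, the round-robin result begins with A1 through A5, followed by B1 through B5, then A6 through A10, and so on.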

[0046] After the interleaver circuit 24 generates the combined instruction stream, the compiler circuit 26 compiles the combined instruction stream to create a test executable that is suitable for execution on either a simulated processor or an actual processor.

[0047] Another embodiment of the invention is directed to a simulation system 60 that is suitable for executing the created test executable 34. As shown in FIG. 5, the simulation system 60 includes a simulation device 62 that receives a test executable 64 (e.g., executable code such as the test executable 34 of FIG. 1) and environment information 66 (e.g., a dxe file), simulates execution of the test executable, and provides results 68 of the execution (e.g., a log file).

[0048] As shown in FIG. 5, the simulation device 62 includes a processor simulator module 70, a reference model module 72, a system or motherboard simulator module 76, and a compare module 74. The processor simulator module 70 operates according to processor design information and is connected with the system or motherboard simulator module 76, which simulates environmental conditions (e.g., provides external clock rates).

[0049] During simulation, the test executable 64 is executed by both the processor simulator module 70 and the reference model module 72. The processor simulator module 70 includes a simulated issue queue 78, and a simulated execution stage 80 having multiple simulated execution units and processor registers. As the processor simulator module 70 executes the test executable 64, results of the execution are passed to the compare module 74. Similarly, the reference model module 72 determines what the correct results of execution should be, and passes the correct results to the compare module 74. The compare module 74 matches the results from both the processor simulator module 70 and the reference model module 72, and points out discrepancies in the results as an error output 68 (e.g., the log file).
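The matching step performed by the compare module 74 can be sketched in Python as follows. The record format, a pair of destination register and value written, is an assumption chosen for illustration, as is the function name `compare_results`; the actual form of the results and of the error output 68 is not specified here.

```python
from typing import List, Tuple

# Assumed record format: (destination register, value written).
Result = Tuple[str, int]


def compare_results(processor: List[Result], reference: List[Result]) -> List[str]:
    """Return one log line for each discrepancy between the two result lists."""
    log = []
    for index, (got, expected) in enumerate(zip(processor, reference), start=1):
        if got != expected:
            log.append(f"result {index}: processor {got} != reference {expected}")
    if len(processor) != len(reference):
        # One side produced more results than the other.
        log.append(
            f"result count: processor {len(processor)} != reference {len(reference)}"
        )
    return log


# Illustrative values only; the second processor result is deliberately wrong.
reference_results = [("R03", 5), ("R05", 2)]
processor_results = [("R03", 5), ("R05", 7)]
error_log = compare_results(processor_results, reference_results)
```

An empty log indicates that the simulated processor matched the reference model for this test executable.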

[0050] The operation of the simulation system 60 will be described further by way of example. This example involves testing a superscalar out-of-order processor that is capable of speculatively issuing and executing instructions. FIGS. 6A through 6D show the contents of the simulated issue queue 78 of the processor simulator module 70, after the occurrence of various multi-instruction fetches of the test executable 34 (i.e., the test executable created by compiling the combined instruction stream of FIG. 4). In particular, in FIG. 6A, the simulated issue queue 78 loads the first four instructions of the test executable 34 during an initial processor cycle (time 0). Since Instruction 1 is the first instruction and does not depend on any other instruction, Instruction 1 is free to issue. However, Instructions 2, 3 and 4 cannot issue due to their RAW dependencies with Instruction 1. Accordingly, during the next processor cycle (time 1), only Instruction 1 will issue (indicated by the rectangle around Instruction 1).

[0051] As shown in FIG. 6B, Instruction 1 is removed from the simulated issue queue 78 after issuing in time 1. Additionally, the remaining three instructions are advanced in their queue positions, and the next four instructions of the test executable 34 are fetched and loaded into the simulated issue queue 78. At this point, Instruction 2 can issue since the simulated processor is capable of issuing instructions speculatively. However, Instructions 3, 4 and 5 cannot issue since they have RAW dependencies with Instruction 2. Instructions 6, 7 and 8 do not depend on these previously fetched instructions. Rather, Instruction 6 has no dependencies, and Instructions 7 and 8 have RAW dependencies with Instruction 6. Although Instruction 6 can issue, Instructions 7 and 8 cannot issue because of their instruction dependencies. Accordingly, in the next processor cycle (time 2), Instructions 2 and 6 issue while the other queued instructions must wait.

[0052] At this point, it should be understood that the test executable 34 has begun to stress both the superscalar and out-of-order capabilities of the processor simultaneously. In particular, two instructions (Instructions 2 and 6) have issued for simultaneous execution to test the simulated processor's superscalar feature. Additionally, Instruction 6 (stored in issue queue position 5) is issued out-of-order to test the simulated processor's out-of-order feature.

[0053] As shown in FIG. 6C, Instructions 2 and 6 are removed from the simulated issue queue 78 after issuing in time 2. Additionally, the remaining instructions are advanced in their queue positions, and the next four instructions of the test executable are fetched and loaded into the simulated issue queue 78. Instructions 3 and 7 can issue since the processor supports speculative execution. Additionally, Instruction 11 has no dependencies and can issue. The rest of the instructions have RAW dependencies with other instructions in the issue queue and must wait. Accordingly, Instructions 3, 7 and 11 issue simultaneously in the next processor cycle (time 3). As shown in FIG. 6C, three instructions are issued from various positions within the issue queue 78 such that both the superscalar and out-of-order features of the simulated processor are stressed.

[0054] As shown in FIG. 6D, Instructions 3, 7 and 11 are removed from the simulated issue queue 78 after issuing in time 3. Furthermore, the remaining instructions are advanced in their queue positions, and the next four instructions of the test executable are fetched and loaded into the simulated issue queue 78. Instructions 4, 8 and 12 can issue since the processor supports speculative execution. Additionally, Instruction 16 has no dependencies and can issue. The rest of the instructions have RAW dependencies with other instructions in the issue queue and must wait. Accordingly, Instructions 4, 8, 12 and 16 issue simultaneously in the next processor cycle (time 4). It should be understood that four instructions are issued from various positions within the issue queue 78 such that both the superscalar and out-of-order features of the simulated processor are further stressed.

[0055] It should be clear from a comparison of FIGS. 6A through 6D that execution of the test executable 34 results in instructions issuing from a variety of different locations within the issue queue 78. Accordingly, the out-of-order capabilities of the processor are well tested.
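The issue behavior walked through for FIGS. 6A through 6D can be reproduced with a small Python model. The model is a simplification resting on several assumptions that are not taken from the figures: four instructions are fetched per cycle, an instruction may issue in the cycle after all of its producers have issued, there is no limit on the number of instructions issued per cycle, and each instruction depends only on the instruction before it within the same five-instruction portion. The names `issue_times` and `producers` are hypothetical.

```python
from typing import Dict, List, Set


def issue_times(producers: Dict[int, Set[int]], fetch_width: int = 4) -> Dict[int, int]:
    """Map each instruction number (1..n, program order) to its issue cycle."""
    total = len(producers)
    issued_at: Dict[int, int] = {}
    queue: List[int] = []
    next_fetch = 1
    cycle = 0
    while len(issued_at) < total:
        # Fetch the next group of instructions into the issue queue.
        for _ in range(fetch_width):
            if next_fetch <= total:
                queue.append(next_fetch)
                next_fetch += 1
        cycle += 1
        # An instruction is ready once every producer issued in an earlier cycle.
        ready = [i for i in queue if all(p in issued_at for p in producers[i])]
        for i in ready:
            issued_at[i] = cycle
        queue = [i for i in queue if i not in issued_at]
    return issued_at


# Twenty instructions in four portions of five; the first instruction of each
# portion has no producer, and every other one reads its predecessor's result.
producers = {i: (set() if i % 5 == 1 else {i - 1}) for i in range(1, 21)}
times = issue_times(producers)
```

Under these assumptions, Instruction 1 issues at time 1, Instructions 2 and 6 at time 2, Instructions 3, 7 and 11 at time 3, and Instructions 4, 8, 12 and 16 at time 4, matching the sequence described above.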

[0056] In some processors, the issue queue receives instructions at a first end, and scans for instructions to issue beginning at the opposite end. For such processors, the instructions migrate from the first end of the issue queue to the opposite end. The test executable 34 is well suited for testing such a processor. In particular, queued instructions issue from positions throughout the issue queue as they migrate from the first end of the issue queue to the opposite end.

[0057] FIG. 7 shows the instructions within the test executable created from the combined instruction stream of FIG. 4, with their respective fetch (F), issue (I), execute (E) and retire (R) times. As illustrated, multiple instructions issue and execute simultaneously and out-of-order, thereby stressing the superscalar and out-of-order capabilities of the simulated processor. Similar results occur when running the test executable 34 on an actual processor.

[0058] Furthermore, the test executable can run on a processor without speculative execution capabilities. In this situation, more fetches must occur to further fill the issue queue with instructions without dependencies before the processor's superscalar capabilities are stressed. Otherwise, the processor behaves in a manner similar to that above for a processor capable of issuing and executing speculatively.

[0059] Equivalents

[0060] While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

[0061] For example, special instructions (e.g., instructions that cause conflicts) may be inserted within the combined instruction streams generated by the interleaver circuit 24 to cause various situations to occur. For example, LOAD instructions may be inserted within the combined instruction stream such that cache misses occur during execution. This would more fully stress the processor's out-of-order capabilities. As an alternative to inserting conflict instructions within the combined instruction stream, the conflict instructions can replace instructions within the combined instruction stream. Such insertions or replacements can be controlled by setting parameters within the configuration information 28.
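Both alternatives, insertion and replacement, can be sketched with a single Python helper. The function `add_conflicts`, its `every` and `replace` parameters, and the placeholder LOAD text are hypothetical; they stand in for whatever parameters the configuration information 28 actually provides.

```python
from typing import List


def add_conflicts(combined: List[str], conflict: str, every: int,
                  replace: bool = False) -> List[str]:
    """Place `conflict` at every `every`-th position of the combined stream.

    With replace=False the conflict instruction is inserted after the
    instruction at that position; with replace=True it takes its place.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    result = []
    for position, inst in enumerate(combined, start=1):
        if position % every == 0:
            if not replace:
                result.append(inst)
            result.append(conflict)
        else:
            result.append(inst)
    return result


# Placeholder instruction text; a real LOAD would name registers and an address.
combined_stream = ["i1", "i2", "i3", "i4"]
with_inserts = add_conflicts(combined_stream, "LOAD", every=2)
with_replacements = add_conflicts(combined_stream, "LOAD", every=2, replace=True)
```

Insertion lengthens the combined stream, whereas replacement keeps its length but removes some of the original instructions, which may also remove dependencies that those instructions carried.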

[0062] Additionally, the groups of registers can be modified such that the different groups overlap. For example, STREAM A and STREAM B can be formed such that both instruction streams access register R08. Such a modification provides an opportunity for inter-stream communication. Another way of adding inter-stream communication is to make multiple streams access overlapping memory spaces. Such features can be controlled by setting parameters within the configuration information 28.

[0063] Some processors treat registers identified within instructions as logical registers, and internally map the logical registers of instructions to physical registers. This operation is called register renaming. The test executable produced by the above-described technique is suitable for testing such processors. In particular, running the test executable on such a processor would stress that processor's renaming features simultaneously with its superscalar and out-of-order features. To enhance testing of the register renaming capabilities of the processor, more instruction streams should be added or the different register groups should be widened such that each logical register is accessed by at least one instruction stream.

[0064] Furthermore, it should be understood that particular aspects of the combined instruction stream can be changed. For example, the number of instruction streams formed by the multiple instruction stream generator circuit 22 can be more or less than four (as shown in the example of FIG. 3). Similarly, the instruction types and the lengths of the instruction stream portions can be changed as well. Accordingly, processor designers can produce multiple test executables that stress various combinations of particular processor features, at different times.

What is claimed is:
1. A method for testing a simulated processor in a computer, comprising the steps of: loading a test executable created from a combined instruction stream having interleaved portions of multiple instruction streams; running the loaded test executable through the simulated processor to generate processor results and through a reference model to generate reference results; and comparing the processor results and the reference results to determine whether the simulated processor operates correctly.
2. The method of claim 1, wherein the multiple instruction streams access different groups of registers, and wherein the step of running includes the step of: executing the test executable such that each of the different groups of registers is accessed.
3. A simulation system for testing a simulated processor, comprising: an input that receives a test executable created from a combined instruction stream having interleaved portions of multiple instruction streams; a processor simulator, coupled to the input, that runs the test executable to generate processor results; a reference model, coupled to the input, that runs the test executable to generate reference results; and a compare module, coupled to the processor simulator and the reference model, that compares the processor results and the reference results to determine whether the simulated processor operates correctly.
4. The simulation system of claim 3, wherein the multiple instruction streams access different groups of registers, and wherein the processor simulator includes: registers that are accessed in the different groups when the processor simulator runs the test executable.